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Abstract. MADNESS (multiresolution adaptive numerical environment for scientific simulation) 
is a high-level software environment for solving integral and differential equations in many dimen¬ 
sions that uses adaptive and fast harmonic analysis methods with guaranteed precision based on 
multiresolution analysis and separated representations. Underpinning the numerical capabilities is a 
powerful petascale parallel programming environment that aims to increase both programmer pro¬ 
ductivity and code scalability. This paper describes the features and capabilities of MADNESS and 
briefly discusses some current applications in chemistry and several areas of physics. 


1. Introduction. Numerical tools are ubiquitous in modern science since the 
relevant models/equations do not generally have analytical solutions. Although ad¬ 
vances in scientific computing over the last thirty years have enabled more sophisti¬ 
cated models and the quantitative simulation of large-scale problems, these numerical 
methods introduce new, unphysical considerations that must also be taken into ac¬ 
count. For instance, function representation, that is, the choice of basis set, is critically 
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important. If a basis set does not adequately resolve the finest details of the system, 
the calculated solutions will likely be inaccurate and have questionable physical in¬ 
terpretation. Other numerical factors include computational efficiency, scalability, 
distributed memory management, differentiation schemes, and quadrature rules. As 
such, most computational approaches force the scientist or engineer to compose soft¬ 
ware in terms of details inherent to the representation (e.g., the coefficients of basis 
functions or integrals within the basis set), rather than at the level of mathematics 
or physics that uses the natural functions or operators of the problem domain. In 
the end, all of these considerations obscure the desired science and force scientists to 
instead focus on computational mundanities. 

Several software packages have been developed to help insulate scientists from 
these issues; two notable examples are Trilinos [28] and PETSc [9]. In essence, these 
and similar frameworks support common scientific computing operations, e.g., lin¬ 
ear algebra algorithms, mesh generation (function representation), nonlinear solvers, 
and ordinary differential equation integrators. These packages also typically scale to 
parallel or massively-parallel machines, facilitating the solution of large-scale prob¬ 
lems. Unfortunately, to our knowledge, all of these packages require the scientist to 
“think” in vectors and matrices, instead of functions and operators. Moreover, with 
an emphasis on engineering applications, computation is typically restricted to three 
or fewer dimensions. 

Our goal in this communication is to motivate and overview the MADNESS 
(multiresolution adaptive numerical environment for scientific simulation) package, 
which offers the following capabilities to users: 

1. The illusion of basis set-free computation. Under this illusion, the user com¬ 
putes with functions and operators instead of vectors and matrices. Behind the scenes, 
functions and operators are expanded in a multi-wavelet basis, where the number of 
basis functions is adapted to guarantee the precision of each operation. As necessary, 
the “mesh”, which is the support of the basis functions, is refined in computationally- 
troublesome regions (perhaps those with fine details) and coarsened in others, enabling 
the simulation of multi-scale problems. 

2. Fast and arbitrarily-accurate methods for solving differential and integral 
equations in one to six dimensions (perhaps more), with specified boundary condi¬ 
tions. These operations include, for example, linear algebra, numerical differentia¬ 
tion/integration, and, perhaps most important, integral convolution kernels, such as 
the Poisson equation Green’s function. 

3. A parallel runtime that allows the user to compose algorithms as a dynami¬ 
cally scheduled set of tasks that operate on objects in global namespaces. 

4. Algorithms and implementations that scale to massively-parallel computa¬ 
tional resources, thereby enabling the solution of large problems. 

We aim, by providing these tools, to raise the level of composition of scientific 
applications, making it faster and easier to both (i) construct robust and correct al¬ 
gorithms and (ii) calculate solutions to new and existing problems. Although MAD- 
NESS’s breakthrough initial application was in computational chemistry [26, 50, 51], 
its framework more generally represents functions, operators and boundary condi¬ 
tions. Thus, it is readily applicable to describe and solve well-defined equations and 
boundary conditions. Extensive applications include, to date, atomic and molecular 
physics, electrostatics, materials science, nanoscience, nuclear physics, solvation mod¬ 
els, and graph theory [21, 22, 30, 39, 40, 46, 49]. We hope that this introduction to 
MADNESS conveys its generality and encourages its application to new domains. 
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The layout of this communication is as follows. Section 2 introduces the basics 
of multiresolution analysis (MRA), which is the mathematical underpinning of MAD¬ 
NESS, and how MRA facilitates scientific computing. We then provide and discuss, in 
§3, some simple examples of programming in MADNESS that illustrate the absence 
of basis set considerations and the ability to write code in terms of functions and op¬ 
erators. Then, §4 summarizes the computational details of MADNESS, including the 
runtime environment/framework and the methods behind MADNESS’s utilization of 
(massively-)parallel architectures. Finally, §5 summarizes this report and describes 
future directions for MADNESS. 

2. Numerics: Mathematical Overview &: Capabilities. With the illusion 
of basis free computation, a MADNESS user simply needs to specify the desired accu¬ 
racy; from this, the numerical framework provides fast computation with guaranteed 
accuracy (in up to 6 dimensions). This is accomplished through several key mecha¬ 
nisms: 

1. The underlying representation is an orthonormal basis [7]; functions are rep¬ 
resented on an adaptive mesh with a discontinuous spectral-element basis set. Within 
each adaptively-refined element the basis functions are tensor products of Legendre 
polynomials. Crucially, the application programmer only uses the function as a single 
entity and never sees or manipulates the basis or mesh. 

2. Guaranteed accuracy for each operation ( e.g ., multiplication or addition of 
functions, application of an operator) is accomplished through automatic refinement 
of the mesh using criteria derived from MRA [7]. 

3. Fast and accurate computation is enabled through MRA that separates be¬ 
havior between length scales. MRA provides a sparse hierarchical (tree) decomposi¬ 
tion of functions and operators that enables automatic refinement or coarsening of 
the mesh to meet the accuracy goal. MRA also yields fast algorithms for physically 
important operators [7, 12, 20]. 

4. Fast and accurate computation beyond one dimension is made feasible through 
accurate and compact representations of operators as sums of separable kernels (usu¬ 
ally Gaussians), which are created through quadrature or more advanced numerical 
techniques [12, 13, 25]. 

In the remainder of this section, we elaborate on these points. 

2 . 1 . Underlying basis and adaptive mesh. Inside MADNESS (using a ID 
example here, for simplicity), the user’s computational domain is mapped to [0,1]. 
This domain is repeatedly subdivided by factors of two such that, at level n, there are 
2" boxes, each of size 2~ n . Within each box, we use the first k Legendre polynomials 
as the basis (or scaling functions in the language of MRA [6, 7]). The polynomials 
are scaled and shifted to form an orthonormal basis. 

Specifically, the i th scaling function (i = 0,..., fc—1) in the I th box (1 = 0,..., 2 n — 
1) at level n (n = 0,1,...) is 

tf i (x) = 2 n '%(2 n x-l), 


where 



V2iTlPi(2x- 1), i6(0,l) 
0, x (0,1) 


( 2 . 1 ) 


is the i th mother scaling function. Projection of a function f(x) into the basis at level 
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n is accomplished by orthogonal projection, 


li 


(2.2a) 


s 


n _ 

li 


d xfixWZix) 


(2.2b) 


Projection is not performed globally on a uniform fine grid; instead, projection starts 
on a relatively coarse level (n). Within each box on level n we compute the projection 
at level n and at the coarser level n—1. If the difference between these projections 
is not satisfactory, the projection is locally repeated at a finer level (n + 1), and so 
forth until the projection is sufficiently accurate. The differences in representations 
between successive levels are efficiently and accurately computed in the wavelet basis, 
which we now describe. 


2.2. Multiresolution analysis. Rather than thinking about the approximation 
of a function at each length scale, MRA focuses upon the difference between length 
scales and, by doing so, exploits both smoothness and sparsity for fast computation. 
The telescoping series clearly illustrates this. Let V n be the space spanned by the 
polynomials at level n. Then, we can write 

V n = © (V 1 0 V°) © (V 2 © V 1 ) © • • • © (V n © V n ~ l ) , (2.3) 


which decomposes the approximation space at level n in terms of the coarsest ap¬ 
proximation (V°) plus corrections at successive levels of refinement. If the function 
of interest is sufficiently smooth in some volume, then differences at finer scales will 
be negligible in that volume. Thus, smoothness is translated into sparsity and hence 
fast computation; a precise definition of “negligible” enables adaptive refinement to 
guarantee accuracy. The multi-wavelets at level n are defined as an orthonormal, 
disjoint basis that spans the difference space W n = V n+1 © V n and are constructed 
by translation and dilation of the mother wavelets [6, 7]. 

We thus have two representations of our function: (i) its projection onto the finest 
level of the adaptively refined grid (be., in real space or in the scaling function basis) 
and (ii) its representation at the coarsest level plus corrections at successive levels of 
refinement (be., in the multi-wavelet basis). The fast wavelet transform (FWT) [11] 
enables fast and accurate transformation between these two representations. From a 
practical perspective, we simply pick the most convenient basis to compute in, just 
as one would if using the Fourier basis and the fast Fourier transform. However, 
the disjoint support of multi-wavelets enables local adaptive refinement, which is not 
possible in the global Fourier basis. A third equivalent representation is the value of 
the function at the Gauss-Legendre quadrature points within each box of the mesh, 
which facilitates certain operations (such as multiplication of functions). 

We can use a similar approach to understand how to turn smoothness into sparsity 
in applying operators. Let P n and Q n be the projectors onto V n and W n , respectively. 
Then, by construction, P n+1 = P n + Q n . Let T be the kernel of an operator we wish 
to apply and let us assume that level n is adequate to resolve both the input and 
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output functions. The representation of the operator at level n is 


rj-rn _ pTirj-ijpn 


= ( pH- 

_ rj-rn — 1 


1 + Q 71 - 1 ) T (pH- 1 + Q 71 - 1 ) 

+ Q n ~ 1 TQ n ~ 1 + Q n ~ 1 TP n ~ 1 


_|_ pn-lpQn-l 


n— 1 

= t °+J2 ( A ' m + B m + c m ), 

m —0 


(2.4a) 


(2.4b) 


where we define A m = Q m TQ m , B m = Q m TP m , and C m = P m TQ m . In this so- 
called non-standard form [10], T° acts only upon scaling functions at the coarsest 
level whereas the operators applied at finer levels all involve wavelets. 

For a demonstrative example of the computational significance of this approach, 
let us examine convolution with the Poisson kernel [7, 25, 26], T = \x-y\~ 1 . Consider 
the operator A m , which connects wavelets at level m to other wavelets at level m. By 
construction, the first k moments of the wavelets vanish; thus, the multipole expansion 
of T tells us that the matrix elements of A m decay as r _2;c_1 ( e.g ., if k = 8, which 
is routine, they decay as r -17 ). In other words, even though T is dense, A m is very 
sparse to any finite precision. A similar approach shows that B m and C m decay 
as Translationally invariant operators yield Toeplitz representations, which 

are very important for practical computation due to the greatly reduced burden of 
computing and storing the operator. 

2.3. Separated representations. The approach described above is not effi¬ 
cient beyond one dimension because the cost of naively applying an operator grows 
as O ((2k) 2D log(e _1 ) £> ) with similarly excessive memory requirements, where D is 
the dimensionality and e is the desired precision. However, for many physically im¬ 
portant operators we can construct [12, 25, 26] either global or local (via numerically 
generated low-rank factorizations of the full operator) separated representations; that 
is, we can approximate, with controlled precision, the full operator as a practically 
short sum of products of operators that separately apply in each dimension. This 
reduces the cost of application to O ( M(2k) D+1 log^ -1 ) 77 ) and the memory footprint 
to O (MD(2k) 2 log(e -1 )) where M is the number of terms in the sum. Powerful tech¬ 
niques for generating these separated forms include quadrature of integral identities 
[25] and numerical techniques developed by Beylkin and Monzon [13]. For example, 
numerical quadrature to evaluate the integral 


2 r° 

7^ 


2 2i 

dse e 


+ S 


yields compact representations of r 1 as a sum over Gaussians, 


M 

= y]c /i e _t ' ir2 +0(er _1 ) ; (2.5) 

M= 1 

in which the number of terms ( M) scales only as O ((log R) (loge -1 )), where [l,i?] is 
the range of validity and e is the relative precision. 

3. Applications: Programming with MADNESS. Having discussed the 
mathematical underpinnings of MADNESS, we briefly examine some samples of nu¬ 
merical programming with MADNESS. Many additional examples, including quantum 
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chemistry and other applications, are provided in the MADNESS code repository (see 
the directories src/apps/ and src/examples/ in the code [1]). 

3.1. Example: Getting Started with MADNESS. Our first example aims 
to evaluate the integral (trace), 2-norm, and electrostatic self-energy of a Gaussian 
function in three dimensions. 


g(x) = e -|5?|2 (3.1a) 

J d 3 f g{x) = 7r 3/2 = 5.5683279 (3.1b) 

\ 1/2 

d 3 £ g(x ) 2 1 = (tt/ 2) 3 / 4 = 1.403104 (3.1c) 

J d 3 x g[x) J d 3 y \x — y\~ 1 g(y) = 7r 5,/2 2 1,/2 = 24.739429 (3.Id) 

Listing 1 shows the entire MADNESS source code for this task, which we now walk 
through. 

1 
2 

3 

4 

5 

6 

7 

8 
9 

10 
11 
12 

13 

14 

15 

16 

17 

18 

19 

20 
21 
22 

Listing 1 

Code for evaluating the integrals in Eq. (3.1). This code can be found in 
src/examples/gaussian.cc. 

Initially ignoring the boiler-plate code, there are only four lines of significance. 

• Line 15 defines the computational volume ([— 6,6] 3 ). 

• Line 16 constructs the Coulomb operator; in actuality, a separated Gaussian 
representation of the Green’s function for Poisson’s equation with free-space 
boundary conditions, as described in §2.3. The two numerical values in the 
constructor indicate the finest length scale to resolve and the desired accuracy 
of the operator, respectively. 

• Line 17 projects the analytic function (lines 5-8) into the adaptive numerical 
basis using the default precision (e = 10~ 4 ) and wavelet order (k = 6). 

• Line 18 prints the results. 

The integral in (3.Id) (the last printed value) is computed as the inner product (inner) 
of the Gaussian function (g) and its convolution with the Green’s function (op(g)). 


^include <madness/mra/mra. h> 

^include Cmadness/mra/operator . h> 
using namespace madness; 

double gaussian (const coord_3d& r) { 
double x=r [0] , y=r [1] , z=r [2] ; 
return exp(—(x*x + y*y + z*z)); 

} 

int main(int argc , char** argv) { 
initia1 i ze(argc, argv); 

World world (SafeMPI : :(X)MVLWORLD) ; 
startup (world , argc , argv) ; 

FunctionDefaults <3>::set_cubic_cell ( — 6.0,6.0) ; 
real.convolution _3d op = CoulombOperator (world , le— 4, le—6); 
real_funct ion_3d g = real_factory_3d ( world) . f ( gaussian ) ; 
print(g.trace(), g.norm2(), inner(g,op(g))); 

finalize () ; 

return 0; 

} 
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Printed results agree with the exact values to six significant figures, which is more than 
might be expected from the default precision of 10~ 4 . This is a common observation 
and is likely due to the projection into the numerical basis oversampling to guarantee 
the requested precision. 

The main program commences (line 11) by initializing the MADNESS parallel 
computing environment (discussed in §4), which includes initializing MPI (if it has 
not yet been initialized). Next, line 12 creates a World object that encapsulates an 
MPI intra-communicator; this plays the same role as an MPI communicator, but pro¬ 
vides the much more powerful capabilities of the MADNESS runtime. Subsequently, 
line 13 initializes the MADNESS numerical environment. The program concludes by 
terminating the parallel computing environment (line 20), which also finalizes MPI if 
the call to initialize() initialized MPI. 

Finally, the MADNESS framework provides support for common mathematical 
and computational processes. For example, algorithms are implemented for (i) matrix- 
free iterative solvers (such as GMRES [41] and BiCGStab [48]), (ii) converting between 
the multi-wavelet basis and a Fourier basis [29], (iii) solving non-linear systems of 
equations [24], and (iv) outputting MADNESS functions for visualization with various 
tools, including Visit [4], OpenDX [2], and ParaView [27]. 

3.2. Molecular electronic structure. One of the initial targets when devel¬ 
oping MADNESS was quantum chemistry, where electronic structure problems for 
atoms and molecules are solved using the density functional theory (DFT) or the 
Hartree-Fock methods [25, 26, 51]. Either of these approaches requires solving a set 
of nonlinear partial differential equations, which, for time-independent systems, are 
typically cast into an eigenvalue problem, 

(-\v 2 + V{xYjiP{x) = Eil){x). (3.2) 

V(x) is the potential energy for an electron at position x and is itself dependent on 
ip{x) (thus the nonlinearity). The eigenvalues are interpreted as energies of molecular 
states and the eigenfunctions as molecular orbitals. 

The standard approach for solving Eq. (3.2) expands ip(x) in a predefined, fixed 
set of basis functions, thereby converting Eq. (3.2) into a generalized matrix equation. 
Our MADNESS implementation follows a different approach [34] to utilize the com¬ 
putational strengths of MADNESS and to avoid the numerical problems associated 
with derivative operators on deeply refined meshes. We recast Eq. (3.2) as an integral 
equation, 

ip(x) = -2 [ d£(-V 2 - 2E)~ 1 V(x)'ijj(x), (3.3) 


which can be solved as a fixed point iteration to the lowest eigenvalue. The integral 
kernel in Eq. (3.3) (which assumes free-space boundary conditions), 


G(x, x) 


1 

4-7T \x — x’\ 


(3.4) 


not only incorporates the correct long-range asymptotic form for bound-state (i.e., 
E < 0) molecular orbitals, but it can be efficiently applied using the separated rep¬ 
resentations discussed in §2.3. Capabilities of the molecular electronic structure code 
include energies for a wide range of functionals (including hybrids) and for many-body 
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methods (second-order M0ller-Plesset in six dimensions [14, 15] and configuration in¬ 
teraction [35]), localized orbitals for reduced scaling, forces [50], solvation [22], and 
linear-response for excited states and dynamic polarizabilities [44, 52]. 

3.3. Nuclear physics. Nuclear physics is another successful application for 
MADNESS [21, 36, 37, 38]. Nuclear density functional theory (NDFT) extends the 
molecular DFT above to describe the complex superfluid many-fermion and boson sys¬ 
tem. A critical difference of NDFT is the use of two-component and four-component 
complex orbitals with particle spins and the appearance of resonance and continuum 
states. For example, the two-component complex wave-functions, u and v, and an as¬ 
sociated pairing potential, A, extend the one-component orbital (i/j) in the molecular 
DFT to NDFT. These modifications ultimately result in the Hartree-Fock-Bogoliubov 
(HFB) equations, 


hf — Af 

A* 


A 

h ~ h. 


Ui 

Vi 



(3.5) 


where Ajq is a single particle Bamiltonian for the given component and A-jq is the 
chemical potential for that component. For the ultracold fermionic gas in the BEC- 
BCS (Bose-Einstein condensate - Bardeen-Cooper-Schriffer) crossover simulation, the 
oscillatory Fulde-Ferrcll-Larkin-Ovchinnikov phase was found to have oscillating pair¬ 
ing gaps such that 


% = -Jv-[Va M (r)]+%(r). (3.6) 

a-j-j. is the kinetic energy density and Ujq is the trapping potential. The BFB equa¬ 
tions are solved with a complex version of the scheme used by the molecular density 
functional application with additional geometric constraints; Eq. (3.5) is transformed 
into a coupled set of integral equations, to which Eq. (3.4) is applied. 

3.4. Atomic and molecular physics. The above applications are all time- 
independent (or introduce time-dependence only through response theory). In [49] 
MADNESS was used by physicists interested in the effects of non-perturbative, few- 
cycle laser pulses on atoms and molecules. This research prompted us to develop 
numerically robust and accurate techniques for evolving quantum wavepackets (and 
more general systems of partial differential and integral equations) forward in time. 
The familiar challenges of time evolution are exacerbated by adaptive refinement and 
truncation of the multi-wavelet basis. Basis truncation introduces high-frequency 
noise, as do the inevitable discontinuities between adjacent sub-domains. This noise 
must be removed from the simulation for accuracy and efficiency. Unfortunately, 
we cannot casually accept that frequencies beyond some band limit (cut-off) will 
be propagated inaccurately; for example, the wave function of the time-dependent 
Schrodinger equation (TDSE) depends upon accurate phase information. Our solu¬ 
tion was to choose a band limit, propagate exactly below the band limit, and filter 
frequencies above it — essentially as if we were computing on a globally very fine, 
uniform mesh but still retaining the advantages of adaptive refinement. 

The TDSE was solved [49] for hydrogenic atoms and other systems subjected to 
a strong, attosecond laser field, and the photoionization cross section and other ob¬ 
servables were computed. The electron behavior was modeled by a three-dimensional 
wave function, (af, £), which, within the dipole approximation and in atomic units, 
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MADNESS Applications: Chemistry, Physics, Nuclear, ... 



MADNESS Math and Numerics 


MADNESS Parallel Runtime 


MPI 

TBB 

BLAS/LAPACK 

Elemental 


Fig. 4.1. The library structure of MADNESS and its dependencies. Each of these sub compo¬ 
nents has its own public application programing interface (API) that may be used both for develop¬ 
ment inside MADNESS as well as in external software packages. 


evolves according to the TDSE, 

= (^ y2 + y ( f ) + i ^‘ f ) ( 3 - 7 ) 

where E is the electric field and V the atomic potential. The initial state, 0), is 
the ground state; that is, the solution of Eq. (3.2) with the smallest E. 

The most efficient evolution was performed with a fourth-order accurate, gradient- 
corrected exponential propagator [19] that is only slightly more expensive than the 
second-order accurate Trotter approximation. Crucial for efficient and accurate ap¬ 
plication was the realization that, while the kernel of the free-space propagator, 

(27rit) -1 / 2 exp(— \x\ 2 /2it), 

is infinite in extent and highly oscillatory, its combination with a projector onto a 
finite bandlimit is compact and also bandlimited. 

4. Computational Framework: The MADNESS Parallel Runtime. The 

MADNESS parallel runtime, like the numerical libraries of MADNESS, is an intuitive 
interface that allows the user to compose algorithms as a dynamically scheduled set 
of tasks that operate on objects in global namespaces. This high-level approach to 
parallel programming offers greater composability than that of the traditional Message 
Passing Interface (MPI) and explicit thread programming (POSIX or C+-(- threads). 
The key runtime features include the use of 

1. global namespaces for building applications that are composed of (global) 
objects interacting via remote methods; 

2. tasks as first-class entities for fine-grained work decomposition; 

3. futures [8] for expressing dependencies between the tasks. 

These features permit the programmer to deal less with the explicit low-level details 
of computation and data partitioning ( e.g ., message passing between processes with 
disjoint data or threads operating on shared data) and focus more on high-level con¬ 
cepts central to the scientific domain of interest. An algorithm properly composed 
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within the MADNESS runtime will (partially) hide costs of data communication and 
be tolerant to load imbalance due to dynamic, data-driven work scheduling. 

Many of the individual concepts appear in other packages, for example, in Cilk 
[16, 17], Charm++ [31, 32, 33], Intel Threading Building Blocks (TBB) [3], and other 
projects including ACE [42, 43] and PGAS languages [53]. Some of these features 
are starting to make their way into mainstream programming languages. C++, for 
example, has included task-based concurrency with asynchronous function calls and 
futures since 2011 (note that MADNESS futures, unlike C++ futures, can refer not 
only to results of local tasks, but also to those of remote tasks). 

For the sake of portability the MADNESS parallel runtime is implemented in C++ 
(2011 standard), MPI-1 and POSIX threads; all non-portable software prerequisites, 
such as Intel TBB, are optional. In the most common scenario, each node executes a 
single MPI process composed of the main application thread. The MADNESS runtime 
spawns a remote method invocation (RMI) thread and a pool of computational threads 
managed by a task queue; these threads are not directly accessible to the user. The 
main application thread runs sequentially through the main program and submits 
tasks (that can themselves generate more tasks) to the local and/or remote task 
queue. 1 The output of each task is encapsulated in a future that can serve as input to 
other tasks. Once its input futures are available (assigned), the task queue schedules 
the tasks for execution. Because MADNESS futures live in a global namespace, 
assigning a future can involve data movement to a remote process; this process is 
facilitated by the RMI thread in the receiving process. When available, MADNESS 
can use the task queue provided by the Intel TBB library; MADNESS defaults to its 
own task queue implemented in terms of POSIX threads. 

By supporting the straightforward composition of a distributed task-based appli¬ 
cation, the MADNESS runtime allows programmers to focus on exposing concurrency 
by decomposing data and computation and/or by optimizing load balance. The low- 
level details of thread scheduling and synchronization, data movement, and commu¬ 
nication hiding are automated. However, the MADNESS runtime provides access to 
enough low-level details of the architecture ( e.g ., process rank) that the placement of 
data and computation can be directly controlled by the user. This allows tight orches¬ 
tration of algorithms with complex data flows, such as in the systolic loops used for 
computing localized electronic states in molecular electronic structure. These abil¬ 
ities, as provided by the MADNESS runtime, are also used by TiledArray [18] (a 
framework for block-sparse tensor computations) to hide communication costs and 
withstand load imbalances in handling block-sparse data. 

In addition to the above core capabilities, the MADNESS parallel runtime also 
provides higher-level constructs built on these features. One such feature is a distributed- 
memory, associative container used to store the functions and operators in the MAD¬ 
NESS numerical library. This container automates the placement and computation 
of distributed data. Other features include parallel algorithms ( e.g ., parallel for-each) 
and task-based collective operations. Many of these components interoperate via fu¬ 
tures and, therefore, fit naturally within the task-based framework. For example, 
the task-based all-reduce algorithm, which is analogous to the MPI_Allreduce func¬ 
tion, uses futures as both the input and output of the function, allowing a seamless 
connection between local computation tasks, collective communication, and remote- 
computation tasks. 


1 The main thread can also act as a computational thread by executing tasks in the task queue 
while waiting for an event (e.g., while waiting inside a barrier). 
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5. Summary. Since the introduction and initial successful science application of 
MADNESS, it received an R&D 100 Award in 2011 [5], and there have been at least 
three independent implementations of the associated numerical methods in Japan 
[45], Norway [23], and the United States [47]. MADNESS itself is now thriving as an 
open-source, community project with financial support from multiple sources. Central 
to people’s interest is the emphasis on a high-level of composition while maintaining 
guarantees of accuracy and speed. It is not that MADNESS is necessarily the fastest 
code for any particular problem, but rather that a user working on tractable problems 
will get the right answer with modest effort and without having to place unnecessary 
emphasis on computational details. However, the caveat of “tractability” is non¬ 
trivial — user attention is still needed to regularize some singularities, and the fast 
algorithms within MADNESS (notably the integral convolution kernels) do not yet 
include scattering operators, which is a subject of current research. Similarly, efficient 
computation is presently limited to simple domains and, as shown in [39], cumbersome 
techniques are presently necessary to accommodate even simple interior boundary 
conditions. Nevertheless, the numerical techniques have demonstrated their worth 
in a broad range of physically interesting applications. Similarly, the MADNESS 
parallel runtime is being successfuly used for petascale computations independent of 
the numerical layer [18], illustrating the power and utility of the massively threaded, 
task-based approach to computation. 
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